Temporal Convolution Network Based Joint Optimization of Acoustic-to-Articulatory Inversion

Authors

Abstract

Articulatory features have proved effective in speech recognition and speech synthesis. However, acquiring articulatory features has always been a difficult research problem, so a lightweight and accurate model is of significant value. In this study, we propose a novel temporal convolution network-based acoustic-to-articulatory inversion system. The acoustic features are converted into a high-dimensional hidden-space feature map through temporal convolution, with frame-level correlations taken into account. Meanwhile, we construct a two-part target function that combines the Root Mean Square Error (RMSE) of the prediction with the Pearson Correlation Coefficient (PCC) of the sequences, so that the performance of the inversion model is jointly optimized from both aspects. We also analyze the impact of the weight between the two parts on the final model. Extensive experiments show that our temporal convolution network (TCN) outperformed the Bi-directional Long Short-Term Memory network, reporting 1.18 mm RMSE, 0.845 PCC, and 14 parameters when optimizing evenly.
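
The two-part target function described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: the weighted-sum form, the use of `1 - PCC` as the correlation term (so that lower is better for both parts), and the names `joint_loss` and `alpha`. A real training setup would compute this on tensors in a deep learning framework; plain Python lists are used only to keep the example self-contained.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equal-length sequences."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


def pcc(pred, target, eps=1e-12):
    """Pearson correlation coefficient between two equal-length sequences.

    eps guards against division by zero for constant sequences.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / (math.sqrt(var_p * var_t) + eps)


def joint_loss(pred, target, alpha=0.5):
    """Hypothetical joint objective: alpha * RMSE + (1 - alpha) * (1 - PCC).

    alpha is an assumed weight between the two parts; alpha=0.5 weights
    them evenly. This is an illustration, not the paper's stated formula.
    """
    return alpha * rmse(pred, target) + (1.0 - alpha) * (1.0 - pcc(pred, target))


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0]   # e.g. one articulator coordinate over frames (mm)
    shifted = [2.0, 3.0, 4.0, 5.0]  # same trajectory shape, constant 1 mm offset
    print(joint_loss(shifted, target, alpha=0.5))
```

In this toy case the shifted prediction is perfectly correlated with the target but offset by 1 mm, so only the RMSE part contributes to the loss. That is the motivation for combining the two terms: RMSE penalizes positional error, while the PCC term penalizes a mismatch in trajectory shape.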

Related articles

Acoustic to articulatory inversion

The context of this work is speech analysis. The subject deals with acoustic-to-articulatory inversion, i.e. the recovery of the temporal evolution of the vocal tract shape from the signal. This topic is important because it is likely to give rise to applications in the domains of speech coding as well as second language learning. Acoustic-to-articulatory inversion relies on an analysis by synt...

Formant trajectories for acoustic-to-articulatory inversion

This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. For this purpose, formant frequencies and formant spectral amplitudes are automatically estimated from audio, and are treated as observations for the purpose of estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepst...

Generalized variable parameter HMMs based acoustic-to-articulatory inversion

Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant approaches in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable ...

Jerk Minimization for Acoustic-To-Articulatory Inversion

The effortless speech production in humans requires coordinated movements of the articulators such as lips, tongue, jaw, velum, etc. Therefore, measured trajectories obtained are smooth and slowly-varying. However, the trajectories estimated from acoustic-to-articulatory inversion (AAI) are found to be jagged. Thus, energy minimization is used as smoothness constraint for improving performance ...

Acoustic-to-articulatory inversion in speech based on statistical models

Two speech inversion methods are implemented and compared. In the first, multistream Hidden Markov Models (HMMs) of phonemes are jointly trained from synchronous streams of articulatory data acquired by EMA and speech spectral parameters; an acoustic recognition system uses the acoustic part of the HMMs to deliver a phoneme chain and the state durations; this information is then used by a traj...

Journal

Journal title: Applied Sciences

Year: 2021

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app11199056